ABSTRACT


In this note we give a necessary and sufficient condition for the
Gateaux differentiability of the probability of misclassification as a function
of a feature selection matrix B, assuming a maximum likelihood classifier
and normally distributed populations. It is also shown that if the probability
of error has a local minimum at B then it is differentiable at B.

On Differentiating the Probability of Error in
the Multipopulation Feature Selection Problem, II.

1. Introduction. 

Let $\pi_1, \ldots, \pi_m$ be populations in $R^n$ with a priori probabilities
$\alpha_1, \ldots, \alpha_m$ and multivariate normal conditional density functions

$$p_i(x) = \frac{1}{(2\pi)^{n/2} |\Sigma_i|^{1/2}} \exp\left[-\tfrac{1}{2}(x-\mu_i)^T \Sigma_i^{-1} (x-\mu_i)\right],$$

$i = 1, \ldots, m$. If $B$ is a $k \times n$ matrix of rank $k$ then the transformed
conditional densities are, for $y \in R^k$,

$$p_i(y,B) = \frac{1}{(2\pi)^{k/2} |B\Sigma_i B^T|^{1/2}} \exp\left[-\tfrac{1}{2}(y-B\mu_i)^T (B\Sigma_i B^T)^{-1} (y-B\mu_i)\right].$$


Let $g(B)$ denote the probability of misclassifying an observation
$x \in R^n$ using the Bayes optimal classifier: classify $x$ in $\pi_i$ if
$\alpha_i p_i(Bx,B) \geq \alpha_j p_j(Bx,B)$ for each $j = 1, \ldots, m$. Then $g(B) = 1 - h(B)$,
where

$$h(B) = \int_{R^k} \max_{1 \leq i \leq m} \alpha_i p_i(y,B)\, dy$$

is the probability of correct classification.
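To make the definition of $h(B)$ concrete, the following is a minimal sketch that approximates the integral numerically in the simplest setting $k = 1$, where $B$ is a single row and each transformed density is a univariate normal. The function names, the trapezoidal rule, and the grid parameters are illustrative assumptions and are not part of the original note.

```python
import math

def projected_stats(b, mu, sigma):
    """Mean B mu and variance B Sigma B^T of b.x when x ~ N(mu, sigma) (k = 1)."""
    n = len(b)
    mean = sum(b[i] * mu[i] for i in range(n))
    var = sum(b[i] * sigma[i][j] * b[j] for i in range(n) for j in range(n))
    return mean, var

def normal_pdf(y, mean, var):
    """Univariate normal density with the given mean and variance."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def prob_correct(b, alphas, mus, sigmas, num_points=20001, width=10.0):
    """Trapezoidal approximation of h(B) = integral of max_i alpha_i p_i(y, B) dy."""
    stats = [projected_stats(b, mu, sg) for mu, sg in zip(mus, sigmas)]
    # Integrate over an interval covering every projected mean +/- width std devs.
    lo = min(m - width * math.sqrt(v) for m, v in stats)
    hi = max(m + width * math.sqrt(v) for m, v in stats)
    step = (hi - lo) / (num_points - 1)
    total = 0.0
    for t in range(num_points):
        y = lo + t * step
        val = max(a * normal_pdf(y, m, v) for a, (m, v) in zip(alphas, stats))
        weight = 0.5 if t in (0, num_points - 1) else 1.0
        total += weight * val
    return total * step

def prob_error(b, alphas, mus, sigmas):
    """g(B) = 1 - h(B)."""
    return 1.0 - prob_correct(b, alphas, mus, sigmas)
```

For two populations with equal priors, identity covariances, and means $(0,0)$ and $(2,0)$, the row $B = (1, 0)$ should give an error close to $\Phi(-1) \approx 0.1587$, while $B = (0, 1)$ projects both populations onto the same density and should give an error of $0.5$.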

If the transformed probability of error is to be used as a feature
selection criterion we require a method for obtaining a $k \times n$ matrix
of rank $k$ which minimizes $g(B)$. If $B_0$ minimizes $g(B)$ then the
Gateaux differential [2, p. 178],

$$\delta g(B_0;C) = \lim_{s \to 0} \frac{g(B_0+sC) - g(B_0)}{s},$$

vanishes for all $k \times n$ matrices $C$ for which it exists. If $\delta g(B_0;C)$
exists for all $k \times n$ matrices $C$, then $g$ is said to be Gateaux differentiable
at $B_0$. Thus it is desirable to have necessary and sufficient conditions for
Gateaux differentiability of $g$ as well as a formula for $\delta g(B;C)$.
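The Gateaux differential exists in a direction $C$ exactly when the right and left difference quotients have a common limit. The sketch below estimates both one-sided quotients for an arbitrary function of a matrix stored as a list of rows. The helper names and the fixed step size are assumptions made for illustration, and a finite step only approximates the limits.

```python
def add_scaled(b, c, s):
    """Return the matrix B + s C for matrices given as lists of rows."""
    return [[bij + s * cij for bij, cij in zip(brow, crow)]
            for brow, crow in zip(b, c)]

def one_sided_quotients(g, b, c, s=1e-6):
    """Approximate the right and left limits of (g(B + sC) - g(B)) / s."""
    base = g(b)
    right = (g(add_scaled(b, c, s)) - base) / s
    left = (g(add_scaled(b, c, -s)) - base) / (-s)
    return right, left

def sum_abs(b):
    """Toy function with a kink at the zero matrix."""
    return sum(abs(x) for row in b for x in row)

def sum_squares(b):
    """Toy smooth function."""
    return sum(x * x for row in b for x in row)
```

With `sum_abs` at the zero matrix the two quotients disagree in sign, so no Gateaux differential exists in that direction. With `sum_squares` they agree up to the step size.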

2. Main Results. 

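For a given $k \times n$ matrix $B$ partition the set $\{\alpha_i p_i(y,B)\}_{i=1}^{m}$ into disjoint
sets

$$S_1 = \{\alpha_{11} p_{11}(y,B),\ \alpha_{12} p_{12}(y,B),\ \ldots,\ \alpha_{1n_1} p_{1n_1}(y,B)\},$$
$$\vdots$$
$$S_r = \{\alpha_{r1} p_{r1}(y,B),\ \alpha_{r2} p_{r2}(y,B),\ \ldots,\ \alpha_{rn_r} p_{rn_r}(y,B)\},$$

where the $S_q$ are defined by

$$\alpha_{qi} p_{qi}(y,B) \equiv \alpha_{qj} p_{qj}(y,B), \quad 1 \leq i, j \leq n_q,$$
$$\alpha_{qj} p_{qj}(y,B) \not\equiv \alpha_{ti} p_{ti}(y,B), \quad q \neq t.$$

For $\ell = 1, \ldots, r$ let

$$R_\ell = \{y \in R^k \mid \alpha_{\ell 1} p_{\ell 1}(y,B) > \alpha_{ti} p_{ti}(y,B) \text{ for all } t \neq \ell\}.$$

The $R_\ell$ are disjoint open sets which cover $R^k$ except for a set $M$ of
measure zero.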


For a given $k \times n$ matrix $C$ write $p_{ij}(y,s)$ for $p_{ij}(y,B+sC)$ and $h(s)$ for
$h(B+sC)$. That is,

$$h(s) = \int_{R^k} \max_{i,j} \alpha_{ij} p_{ij}(y,s)\, dy.$$


Theorem 1: $h$ is Gateaux differentiable at $B$ if and only if for each $\ell$
such that $R_\ell \neq \emptyset$, $\mu_{\ell i} = \mu_{\ell j}$ and
$\Sigma_{\ell i} B^T = \Sigma_{\ell j} B^T$ for each $i, j \leq n_\ell$.


Proof: By repeating some of the members of the $S_q$'s if necessary, we can
assume $n_1 = n_2 = \cdots = n_r = n_0$. Thus

$$h(s) = \int_{R^k} \max_{1 \leq j \leq n_0} \max_{1 \leq i \leq r} \alpha_{ij} p_{ij}(y,s)\, dy = \int_{R^k} \max_{1 \leq j \leq n_0} f_j(y,s)\, dy,$$

where

$$f_j(y,s) = \max_{1 \leq i \leq r} \alpha_{ij} p_{ij}(y,s).$$

The $f_j(y,s)$ have the properties:

1) $f_1(y,0) = f_2(y,0) = \cdots = f_{n_0}(y,0)$

and

2) $\frac{\partial f_j}{\partial s}(y,0)$ is defined for all $y \notin M$, $j = 1, \ldots, n_0$.

By an argument in [3], it can be shown that for sufficiently small $s$, the difference quotients

$$\frac{f_j(y,s) - f_j(y,0)}{s}$$

are bounded by an integrable function $g(y)$ for $y \notin M$. Hence, for $s > 0$,


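$$\begin{aligned}
\frac{h(s) - h(0)}{s} &= \int_{R^k} \frac{1}{s}\left[\max_{j \leq n_0} f_j(y,s) - \max_{j \leq n_0} f_j(y,0)\right] dy \\
&= \int_{R^k} \frac{1}{s} \max_{j \leq n_0} \left[f_j(y,s) - f_j(y,0)\right] dy \\
&= \int_{R^k} \max_{j \leq n_0} \frac{f_j(y,s) - f_j(y,0)}{s}\, dy \;\to\; \int_{R^k} \max_{j \leq n_0} \frac{\partial f_j}{\partial s}(y,0)\, dy
\end{aligned}$$

as $s \to 0+$. On the other hand, for $s < 0$,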


$$\frac{h(s) - h(0)}{s} = \int_{R^k} \min_{j \leq n_0} \frac{f_j(y,s) - f_j(y,0)}{s}\, dy \;\to\; \int_{R^k} \min_{j \leq n_0} \frac{\partial f_j}{\partial s}(y,0)\, dy$$

as $s \to 0-$. Thus the Gateaux differential $h'(0)$ exists if and only if


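$$\max_{j \leq n_0} \frac{\partial f_j}{\partial s}(y,0) = \min_{j \leq n_0} \frac{\partial f_j}{\partial s}(y,0) \quad \text{a.e.}$$

That is, if and only if

$$\frac{\partial f_i}{\partial s}(y,0) = \frac{\partial f_j}{\partial s}(y,0) \quad \text{a.e.}$$

for all $i, j \leq n_0$. For $y \in R_\ell$ it is readily verified that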


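$$\frac{\partial f_i}{\partial s}(y,0) = \alpha_{\ell i} \frac{\partial p_{\ell i}}{\partial s}(y,0).$$

Hence, $h'(0)$ exists if and only if

$$\alpha_{\ell i} \frac{\partial p_{\ell i}}{\partial s}(y,0) = \alpha_{\ell j} \frac{\partial p_{\ell j}}{\partial s}(y,0)$$

for $i, j \leq n_0$, almost all $y \in R_\ell$, $\ell = 1, \ldots, r$.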

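It is shown in [1] that

$$\alpha_{\ell j} \frac{\partial p_{\ell j}}{\partial s}(y,0) = \alpha_{\ell j}\, p_{\ell j}(y,0) \Big\{ (y - B\mu_{\ell j})^T (B\Sigma_{\ell j}B^T)^{-1} \big[ C\mu_{\ell j} + C\Sigma_{\ell j}B^T (B\Sigma_{\ell j}B^T)^{-1} (y - B\mu_{\ell j}) \big] - \mathrm{tr}\big[ C\Sigma_{\ell j}B^T (B\Sigma_{\ell j}B^T)^{-1} \big] \Big\}.$$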
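Since $B\mu_{\ell j} = B\mu_{\ell i}$, $B\Sigma_{\ell j}B^T = B\Sigma_{\ell i}B^T$, and $\alpha_{\ell j} = \alpha_{\ell i}$,

$$\alpha_{\ell j} \frac{\partial p_{\ell j}}{\partial s}(y,0) = \alpha_{\ell i}\, p_{\ell i}(y,0) \Big\{ (y - B\mu_{\ell i})^T (B\Sigma_{\ell i}B^T)^{-1} \big[ C\mu_{\ell j} + C\Sigma_{\ell j}B^T (B\Sigma_{\ell i}B^T)^{-1} (y - B\mu_{\ell i}) \big] - \mathrm{tr}\big[ C\Sigma_{\ell j}B^T (B\Sigma_{\ell i}B^T)^{-1} \big] \Big\}.$$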


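If $R_\ell \neq \emptyset$, then $R_\ell$ has positive measure. Thus it is easily seen
that if $R_\ell \neq \emptyset$,

$$\alpha_{\ell i} \frac{\partial p_{\ell i}}{\partial s}(y,0) = \alpha_{\ell j} \frac{\partial p_{\ell j}}{\partial s}(y,0) \quad \text{a.e. in } R_\ell$$

if and only if $C\mu_{\ell j} = C\mu_{\ell i}$ and $C\Sigma_{\ell j}B^T = C\Sigma_{\ell i}B^T$ for all $i, j \leq n_\ell$.
Thus, since $C$ is an arbitrary $k \times n$ matrix, $h$ is Gateaux differentiable at
$B$ if and only if $\mu_{\ell i} = \mu_{\ell j}$ and $\Sigma_{\ell i}B^T = \Sigma_{\ell j}B^T$ for all $i, j \leq n_\ell$ and all $\ell$ such that
$R_\ell \neq \emptyset$. This concludes the proof.

It is clear that if $h$ is Gateaux differentiable at $B$, then

$$\delta h(B;C) = \sum_{i=1}^{r} \alpha_{i1} \int_{R_i} \delta p_{i1}(y,B;C)\, dy.$$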
Thus the Gateaux differential of the probability of error is

$$\delta g(B;C) = -\sum_{i=1}^{r} \alpha_{i1} \int_{R_i} \delta p_{i1}(y,B;C)\, dy.$$

Theorem 2: If $h$ has a local maximum at $B$, then $h$ is Gateaux differentiable
at $B$.


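Proof: It is evident from the proof of Theorem 1 that for any $k \times n$ matrix
$C$,

$$\limsup_{s \to 0} \frac{h(B+sC) - h(B)}{s} = \lim_{s \to 0+} \frac{h(B+sC) - h(B)}{s} = \int_{R^k} \max_{j \leq n_0} \frac{\partial f_j}{\partial s}(y,0)\, dy$$

and

$$\liminf_{s \to 0} \frac{h(B+sC) - h(B)}{s} = \lim_{s \to 0-} \frac{h(B+sC) - h(B)}{s} = \int_{R^k} \min_{j \leq n_0} \frac{\partial f_j}{\partial s}(y,0)\, dy.$$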

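If $h$ has a local maximum at $B$, then the difference quotient is nonpositive for
small $s > 0$ and nonnegative for small $s < 0$. Since both one-sided limits exist,
the limit from the right is at most 0 and the limit from the left is at least 0,
while the limit superior is never smaller than the limit inferior. Hence

$$\limsup_{s \to 0} \frac{h(B+sC) - h(B)}{s} = \lim_{s \to 0-} \frac{h(B+sC) - h(B)}{s} = \liminf_{s \to 0} \frac{h(B+sC) - h(B)}{s}.$$

Thus $h$ is Gateaux differentiable at $B$. Q.E.D.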
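The derivative formula quoted from [1] in the proof of Theorem 1 can be sanity-checked numerically. The sketch below specialises it to $k = 1$, where $B$ and $C$ are single rows and $B\Sigma B^T$, $C\mu$, and $C\Sigma B^T$ are scalars, and compares it with a central difference of $p(y, B+sC)$ in $s$. All function names and the test values are hypothetical choices made for illustration.

```python
import math

def dot(u, v):
    """Inner product of two row vectors."""
    return sum(a * b for a, b in zip(u, v))

def quad(u, sigma, v):
    """Return u Sigma v^T for row vectors u and v."""
    return sum(u[i] * sigma[i][j] * v[j]
               for i in range(len(u)) for j in range(len(v)))

def density(y, b, mu, sigma):
    """p(y, B) for k = 1: normal with mean B mu and variance B Sigma B^T."""
    mean, var = dot(b, mu), quad(b, sigma, b)
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def density_derivative(y, b, c, mu, sigma):
    """Closed-form d/ds p(y, B + sC) at s = 0, specialised to k = 1."""
    a = quad(b, sigma, b)        # B Sigma B^T
    v = y - dot(b, mu)           # y - B mu
    c_mu = dot(c, mu)            # C mu
    c_sig_b = quad(c, sigma, b)  # C Sigma B^T (equal to its trace when k = 1)
    bracket = (v / a) * (c_mu + c_sig_b * v / a) - c_sig_b / a
    return density(y, b, mu, sigma) * bracket

def central_difference(y, b, c, mu, sigma, s=1e-6):
    """Finite-difference estimate of the same derivative."""
    plus = [bi + s * ci for bi, ci in zip(b, c)]
    minus = [bi - s * ci for bi, ci in zip(b, c)]
    return (density(y, plus, mu, sigma) - density(y, minus, mu, sigma)) / (2.0 * s)
```

For any admissible inputs the two functions should agree to within the finite-difference error, which gives a quick check that the trace term carries the inverse $(B\Sigma B^T)^{-1}$.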

3. Concluding Remarks. 

The meaning of the necessary and sufficient condition for differentiability
of $g(B)$ becomes a little more obvious when it is applied to the two population
problem. Let $\pi_1$ and $\pi_2$ be normally distributed populations in $R^n$ with class
statistics $\alpha_1, \mu_1, \Sigma_1$ and $\alpha_2, \mu_2, \Sigma_2$ respectively.

Case 1: $\alpha_1 \neq \alpha_2$. Then $g(B)$ is differentiable for all $B$.

Case 2: $\alpha_1 = \alpha_2$, $\mu_1 \neq \mu_2$. Then $g$ is differentiable at $B$ if and only
if $B\mu_1 \neq B\mu_2$ or $B\Sigma_1 B^T \neq B\Sigma_2 B^T$.

Case 3: $\alpha_1 = \alpha_2$, $\mu_1 = \mu_2$, $\Sigma_1 - \Sigma_2$ is invertible. Then $g$ is differentiable
at $B$ if and only if $B\Sigma_1 B^T \neq B\Sigma_2 B^T$.

Case 4: $\alpha_1 = \alpha_2$, $\mu_1 = \mu_2$, $\Sigma_1 - \Sigma_2$ is not invertible. Then $g$ is
differentiable at $B$ if and only if $B\Sigma_1 B^T \neq B\Sigma_2 B^T$ or $\Sigma_1 B^T = \Sigma_2 B^T$.

As a special case of Case 4, we have the degenerate case in which the
class statistics for $\pi_1$ and $\pi_2$ are the same. Then $g$ is differentiable for
all $B$ and has derivative 0. Finally, we remark that it is mistakenly asserted
in [3] that the condition $\alpha_i p_i(y,B) \not\equiv \alpha_j p_j(y,B)$ for $i \neq j$ is necessary as well as
sufficient for differentiability of $g(B)$. As the analysis above shows, this
is not even true in the two population problem.
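Case 2 can be illustrated numerically in a small example with $k = 1$, equal priors, and a common covariance. In that setting the probability of correct classification has the standard closed form $\Phi(d/2)$, where $d$ is the distance between the projected means in units of the projected standard deviation. That closed form is a textbook fact brought in for this sketch, not a formula from this note, and the function names are hypothetical.

```python
import math

def std_normal_cdf(z):
    """Standard normal distribution function Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_correct_equal_cov(b, mu1, mu2, sigma):
    """h(B) for k = 1, equal priors, common covariance: Phi(d / 2)."""
    diff = sum(bi * (m1 - m2) for bi, m1, m2 in zip(b, mu1, mu2))
    var = sum(b[i] * sigma[i][j] * b[j]
              for i in range(len(b)) for j in range(len(b)))
    return std_normal_cdf(abs(diff) / (2.0 * math.sqrt(var)))

def one_sided_slopes(b, c, mu1, mu2, sigma, s=1e-6):
    """Right and left difference quotients of h(B + sC) at s = 0."""
    base = prob_correct_equal_cov(b, mu1, mu2, sigma)
    right_b = [bi + s * ci for bi, ci in zip(b, c)]
    left_b = [bi - s * ci for bi, ci in zip(b, c)]
    right = (prob_correct_equal_cov(right_b, mu1, mu2, sigma) - base) / s
    left = (prob_correct_equal_cov(left_b, mu1, mu2, sigma) - base) / (-s)
    return right, left

mu1, mu2 = [0.0, 0.0], [2.0, 0.0]
identity = [[1.0, 0.0], [0.0, 1.0]]

# B = (0, 1): B mu_1 = B mu_2 and the projected variances agree, so Case 2
# predicts a kink; the slopes are roughly +0.399 and -0.399.
kink = one_sided_slopes([0.0, 1.0], [1.0, 0.0], mu1, mu2, identity)

# B = (1, 0): B mu_1 differs from B mu_2, so the two slopes should agree.
smooth = one_sided_slopes([1.0, 0.0], [0.0, 1.0], mu1, mu2, identity)
```

At $B = (0, 1)$ the two populations project onto the same density although $\mu_1 \neq \mu_2$, and the one-sided slopes of $h$ differ in sign, so neither $h$ nor $g$ is Gateaux differentiable there. At $B = (1, 0)$ the slopes coincide.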